A 17-Gene Expression Signature for Early Identification of Poor Prognosis in Clear Cell Renal Cell Carcinoma

Simple Summary Our analysis of a 17-gene expression signature resulted in being significantly different among patients with clear cell renal cancer cell (ccRCC) who reported a recurrence-free survival (RFS) >5 years and patients with a RFS < 1 year. This Genomic Signatures could be useful to better identify good prognosis (with favorable genomic signature) against poor prognosis (with unfavorable genomic signature) ccRCC. Accordingly, both follow-up and treatment could be profoundly personalized for patients with neodiagnosis of ccRCC in the near future. Abstract The Identification of reliable Biomarkers able to predict the outcome after nephrectomy of patients with clear cell renal cell carcinoma (ccRCC) is an unmet need. The gene expression analysis in tumor tissues represents a promising tool for better stratification of ccRCC subtypes and patients’ evaluation. Methods: In our study we retrospectively analyzed using Next-Generation expression analysis (NanoString), the expression of a gene panel in tumor tissue from 46 consecutive patients treated with nephrectomy for non-metastatic ccRCC at two Italian Oncological Centres. Significant differences in expression levels of selected genes was sought. Additionally, we performed a univariate and a multivariate analysis on overall survival according to Cox regression model. Results: A 17-gene expression signature of patients with a recurrence-free survival (RFS) < 1 year (unfavorable genomic signature (UGS)) and of patients with a RFS > 5 years (favorable genomic signature (FGS)) was identified and resulted in being significantly correlated with overall survival of the patients included in this analysis (HR 51.37, p < 0.0001). Conclusions: The identified Genomic Signatures may serve as potential biomarkers for prognosis prediction of non-metastatic RCC and could drive both follow-up and treatment personalization in RCC management.


Introduction
Renal Cell Carcinoma (RCC) accounts for approximately 3% of all malignancies [1]. In Italy, it causes more than 3.717 deaths/year and the incidence of new cases is estimated at approximately 12,600 new cases/year [2]. The most frequent (70-85%) histologic subtype is the clear cell renal cell carcinoma (ccRCC), a highly vascular tumor arising from the proximal tubules of [3]. The treatment of choice for early-stage disease is radical or partial nephrectomy; however, about 50% of subjects with clinically localized disease will eventually relapse, and two-thirds of them will recur within the first year [4][5][6]. Further, effective adjuvant therapies have not been established yet, since VEGF TKIs have failed in high-risk (pT3, pT4, node-positive), resected RCC, and the ASSURE [7,8], S-TRAC [9,10] and PROTECT trials [11]. Using Sorafenib, Sunitinib, and Pazopanib in an adjuvant setting did not reveal an improvement in DFS and OS. The double-blind, phase 3 trial, KEYNOTE-564, showed that patients with clear-cell renal-cell carcinoma who were at high risk for recurrence after nephrectomy, with or without metastasectomy, who received adjuvant pembrolizumab therapy had significantly longer disease-free survival than placebo (diseasefree survival at 24 months, 77.3% vs. 68.1%; hazard ratio for recurrence or death, 0.68; 95% confidence interval [CI], 0.53 to 0.87; p = 0.002 [two-sided]) [12]. According to this data on the role of pembrolizumab as adjuvant treatment, the identification of a prognostic gene signature could be very helpful for better selection of those patients at higher risk of recurrence who may benefit from adjuvant treatment.
Presently, two clinical models can be used to evaluate the risk of ccRCC progression: the Mayo Clinic Stage, Size, Grade, and Necrosis (SSIGN) [13] score and the University of California Los Angeles Integrated Staging System (UISS) [14]. The SSIGN system analyzes histological features as TNM tumor stage (p < 0.001), size ≥ 5 cm (p < 0.001), nuclear grade (p < 0.001), and tumor necrosis (p < 0.001) and assigns a risk score from 3 to 10, the higher the score, the shorter the median survival.
The UISS model evaluates histological parameters like TNM stage, Fuhrman grade, and Eastern Cooperative Oncology Group performance status and stratifies patients with localized RCC in low, intermediate, and poor risk groups with 5-year survival rates of 92%, 67%, and 44%, respectively.
The ccRCC is characterized by high molecular heterogeneity, as pointed out by the number of involved driver genes. Even if the most prevalent loss-of-function mutation in ccRCC affects the von Hippel-Lindau (VHL, 44-90% of cases) [15], gene mutations have also been identified in PBRM1 (32-41%), BAP1 (6-15%), SETD2 (3-11%), TP53, (5%), KDM5C (3-5%), PIK3CA (3%), TSC1 (3%), ARID1A (2%), and CDKN2A (2%). Dissection of the molecular heterogeneity characterizing the tumor tissues is, thus, of paramount importance to explain the landscape of clinical manifestations, progression risk, and differential response to pharmacological therapies. In particular, gene expression analysis in tumor tissues represents a promising tool for better stratification of ccRCC subtypes and patients' evaluation. Previous studies reported ccRCC gene expression signatures associated with prognosis, as the ClearCode34 [15][16][17], or associated with recurrence risk and therapy response [18,19]. The molecular markers identified by different studies differ according to the patient selection criteria (tumor type, stage, grade, therapies) and to the Cancers 2022, 14, 178 3 of 11 selected outcome (survival, progression, recurrence). Therefore, presently there are no validated gene expression prognostic biomarkers applicable in all ccRCC setting. Further, to enhance clinical adoption of sophisticated molecular diagnostic panels, such as gene expression analysis, a better performance compared to histological/clinical evaluation should be demonstrated.
In recent years, a novel, medium-throughput (up to 800 genes analyzed simultaneously) technology for gene expression analysis, that is, the Nanostring nCounter system [20], allows implementing gene signatures in clinical practice [21]. We developed a Nanostring 195-plexed gene expression panel for ccRCC evaluation, including probes for (1) the most relevant (to our knowledge) prognostic gene signatures reported in literature, (2) genes frequently mutated in ccRCC, and (3) genes reported as differentially expressed in ccRCC compared to normal tissue (see Materials and Methods). In this retrospective study, we employed this analytical panel to compare gene expression between non-metastatic ccRCC patients who had a recurrence-free survival (RFS) of < 1 year and non-metastatic ccRCC patients who had an RFS of >5 year.

Patients
This was an observational, case-control, retrospective, multicenter study, including 46 adult patients (age ≥ 18) referred to the Sant'Andrea Hospital "Sapienza" University of Rome and "Regina Elena" National Cancer Institute of Rome in the period 2012-2018. Patients were consecutively enrolled, according to the stage at the time of diagnosis (stage I-III ccRCC) and treated with nephrectomy (defined as "partial" or "radical") with no previous systemic therapy. After that surgery, no adjuvant treatment was done. Patients were excluded if biopsies' samples were derived from metastases. Follow-up of patients was performed according to the standard of care at the participating institutions. The SSIGN scoring system (1997 TNM stage, tumor size ≥ 5 cm nuclear grade, and histological tumor necrosis) was used to define the aggressiveness of ccRCC.
Patients were categorized according to the RFS, defined as the time from the date of surgery to the date of recurrence or last follow-up. The group with unfavorable genomic signature (UGS, N = 22) had an RFS of <1 year and the favorable genomic signature (FGS, N = 24) had an RFS of >5 years. Data collected from medical records and pathology reports included Karnofsky performance status (PS), diameters of the primary tumors, Fuhrman grade, lymph node involvement, tumor necrosis, sarcomatoid component, surgery (cytoreductive, partial or radical nephrectomy), tumor stage, date of radiographic or clinical progression, and date of death or last follow-up. Overall survival (OS) was calculated from the time of surgery to the date of death for any cause or last follow-up.
Ethical approval for this study was obtained by the local committees (Prot. n. 107 SA_2017 del 19.04.2017; RIF. CE: 4407) and patients provided written, informed consent.

Gene Expression Panel Selection
The full list of selected genes, with probes' sequences and isoform coverage, is reported in the Supplementary Table S1. The gene expression panel was designed to include genes previously identified as prognostics' factors [16,19], gene-defining molecular subtypes according to Beuselinck et al. [16], genes identified as differentially expressed between BAP1-and PBRM1-mutant tumors [22], genes involved in energy metabolism and in ccRCC [23], and genes from the ccRCC dataset from The Cancer Genome Atlas (TCGA) [24]. Nine housekeeping genes were also included.

RNA Extraction and Nanostring Analysis
Total RNA was extracted using the High Pure FFPET RNA Isolation Kit (Roche, Basilea, Switzerland), according to the manufacturer's protocol, and quantified using the NanoDrop2000 spectrophotometer (Thermo Fisher Scientific, Waltham, DE, USA). After evaluation of RNA size and integrity using the 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA), samples were stored at −80 • C until analysis. RNA samples were excluded from the analyses if they had concentrations <25 ng/µL or a RNA Integrity Number value < 6.5.
Nanostring analysis was performed according to the manufacturer's instructions. Briefly, 5 µL of each sample were mixed with 8 µL of the hybridization cocktail containing the reporter code set. Then, 2 µL of the capture code set were added. Hybridization was performed in a 65 • C thermocycler (Veriti Thermal Cycler, Applied Biosystems, Foster City, CA, USA) for 18 h. The samples were then loaded onto the Nanostring cartridge using the automated Nanostring prep station, and the cartridge was scanned with the Nanostring Digital Analyzer to obtain probes' counts.

Sample Size Calculation
In order to determine sample size, we focused on the False Discovery Rate (FDR). This rate depends strongly on the formula (1-π)/α where α is the type I error and π is the proportion of the genes that are not differentially expressed between the two compared groups. The value of (1-π) is typically in the range of 0.005 to 0.05, but in our study we expected a higher rate because of the candidate gene approach we used and because the two groups are highly different in prognostic terms (patients with RFS of <1 year and patients with RFS of >5 years). The ssize.fdr package version 1.2 (R version 2.14.2) was used. Assuming an effect size of 0.80, a power of 80%, and a FDR = 0.05, a range of different sample sizes according to π were investigated. On the basis of this calculation and of availability of retrospective information and practical consideration, a total of 46 patients were included in the study (24 patients with RFS of < 1 year and 22 with RFS of > 5 years); this sample size allowed us to assess a π = 0.90. Differences between patients' characteristics in the two groups were assessed by chi-square test when related to categorical variables and by Mann-Whitney test when considering quantitative items. Cox proportional hazard model was used to estimate hazard ratios and their 95% confidence intervals.
The primary end point was the identification of an UGS associated with worse prognosis (RFS < 1 year) and of a FGS associated with a better prognosis (RFS > 5 years). The gene expression signature associated with poor prognosis was assessed by tumor characteristics, including the SSIGN (Stage, Size, Grade, and Necrosis) score. The SSIGN score, composed of four clinical assessment measures (7th version TNM classification system, tumor size ≥ 5 cm, nuclear grade, and histological tumor necrosis), was used to define aggressiveness of ccRCC.

Nanostring Data Analysis
The raw data file from the Nanostring Digital Analyzer was analyzed by the Nanostring nCounter nSolver™ 4.0 using the Nanostring Advanced Analysis Module 2.0 plugin. The Advanced Analysis Module 2.0 software uses open-source R programs for quality control (QC), normalization, and differential expression (DE) analysis. The analysis panel included six positive control probes with known expected counts' number and eight negative control probes to test for analytical quality of each experimental run. Thus, technical normalization was performed using nSolver™ 4.0, according to the Nanostring Gene Expression Data Analysis Guidelines (Nanostring MAN-C0011-04), before running the Advanced Analysis Module 2.0. Biological normalization was then achieved by selecting the best reference probes among the nine housekeeping genes included in the panel. These reference genes (ACTB, CLTC, VDAC2, PGK1, B2M) were selected using the geNorm function of the Advanced Analysis Module, which ranks housekeeping genes according to the minimum expression variance among samples. DE analysis between the PP group and the GP group was performed using batch and cartridge IDs factored as confounding factors.

Study Design
This study design considered two groups of patients who were very different in prognostic terms. In the first step, the gene finding and the standard t statistic were used for the ranking; the t statistic is calculated as the difference in the class-specific means of log expression divided by an estimate of the standard error of the difference. In order to control the "false discovery rate" (FDR), a stringent threshold of significance was used (p = 0.01). Considering a total of 195 genes to be investigated, the expected number of false positives was about two genes; the FDR was given by the ratio between 2 and the number of genes for which the univariate significance level was less than 0.01.
The second step consisted of the development of a predictive classifier. A class predictor, or predictive classifier, is a computable function that can be used to predict a class from an expression profile. As suggested by Simon, in developing this predictive classifier, the principal aim was the predictive accuracy, sensitivity, specificity, and positive and negative predictive values and not the goodness of fit to the data or the statistical significance of regression coefficients.
The third step was the validation of this classifier, and a cross-validation method was implemented based on the leave-one-out procedure. The cross-validated prediction error is an estimate of the prediction error associated with the application of the algorithm for model building to the entire dataset.
At the end of the entire process, the classifier was completely specified (including cutoffs) to allow validation in external independent sets.

Results
According to RFS, patients were divided in two groups: 24 patients in the group with a RFS of >5 years versus 22 patients with an RFS of < 1 year. The two study groups were well balanced in terms of sex, tumor grade, necrosis, and sarcomatoid component representation in the histology, whereas the median age, disease stage, Karnofsky PS, and surgery done were significantly different between the two groups (Table 1). Differential analysis of gene expression between the RFS ≤ 1 year and RFS > 5 years groups identified 17 genes that maintained significantly different expression levels after Bonferroni correction ( Table 2). This molecular signature was entirely down regulated in the RFS ≤ 1-year group compared to the RFS > 5-years group and was thus named as UGS.
Furthermore, we compared our risk assessment tool with existing clinical nomogram to predict death from ccRCC. SSIGN score distribution in two prognostic groups was very heterogeneous (Figure 1), suggesting an independent contribution of this signature to prognosis.  Furthermore, we compared our risk assessment tool with existing clinical nomogram to predict death from ccRCC. SSIGN score distribution in two prognostic groups was very heterogeneous (Figure 1), suggesting an independent contribution of this signature to prognosis. As expected, those patients who reported a RFS from the time of surgery of >5 years and a FGS had a longer OS than patients who reported a RFS of < 1 years and UGS. The univariate analysis on OS according to Cox regression model showed an association between Karnofsky Performance Status (p < 0.0001), tumor stage (p = 0.03), and SSIGN score (p = 0.003) and signature (p < 0.0001); no association was found for gender (p = 0.90), age (p = 0.11), grade (p = 0.78), necrosis (p = 0.75), and type of surgery (p = 0.08). (Table 3) However, at the multivariate analysis, when significant associations were tested simultaneously, and not considering signature, only Karnofsky (p < 0.0001) and SSIGN (p = 0.006) maintained their significance. When introducing signature, this was the only significant parameter (HR 51.34 (6.75-390.96); p < 0.0001)) ( Table 3). As expected, those patients who reported a RFS from the time of surgery of >5 years and a FGS had a longer OS than patients who reported a RFS of < 1 years and UGS. The univariate analysis on OS according to Cox regression model showed an association between Karnofsky Performance Status (p < 0.0001), tumor stage (p = 0.03), and SSIGN score (p = 0.003) and signature (p < 0.0001); no association was found for gender (p = 0.90), age (p = 0.11), grade (p = 0.78), necrosis (p = 0.75), and type of surgery (p = 0.08). (Table 3) However, at the multivariate analysis, when significant associations were tested simultaneously, and not considering signature, only Karnofsky (p < 0.0001) and SSIGN (p = 0.006) maintained their significance. When introducing signature, this was the only significant parameter (HR 51.34 (6.75-390.96); p < 0.0001)) ( Table 3).

Discussion
We identified a 17-gene expression signature to predict the outcome of ccRCC patients. The primary endpoint was the identification of an unfavorable genomic signature associated with worse prognosis (RFS of <1 year) and of a favorable genomic signature associated with a better prognosis (RFS of >5 years). The gene expression signature associated with poor prognosis was assessed by tumor characteristics, including the SSIGN score. In this analysis, we distinguished two different subtypes of ccRCC characterized by different outcomes. Overexpression of a gene involved in infection/inflammation, PI3K-Akt signaling, HIF-1 signaling, and pentose phosphate pathway, was associated with an increased risk of recurrence.
The TCGA data showed the correlation between disease aggressiveness and metabolic shift that involved increased dependence on pentose phosphate shunt, downregulation of AMP-activated protein kinase (AMPK), and the Krebs cycle and increased glutamine transport and fatty acid production [25]. Additionally, Zhang Q. and colleagues showed the association between overexpression of glucose 6-phosphate dehydrogenase (G6PD) with poor prognosis [26].
In our study, patients with UGS were characterized by overexpression of POLD4. Interestingly, the role of POLD4 is a gene involved in mismatch repair, base excision repair, DNA replication, homologous recombination, and nucleotide excision repair. POLD4 downregulation activates checkpoint proteins, induces G1-S arrest, and delays the cell cycle from S to G2. POLD4 reduction induces also modest genomic instability, while allowing cells to grow until DNA damage reaches an intolerant level [27,28]. Its role in ccRCC is unclear.
USG was characterized by a high expression of interlekin-6 (IL-6), a gene involved in infection/inflammation that seems to play a pro-tumorigenic role, contributing to proliferation and invasion of the tumor cell. Wang et al. [29] showed the correlation between a high level of IL-6 and poor survival in RCC patients. Multigenic assay by Rini included IL-6 in inflammatory pathway [19]. Some studies showed the association between a high level of IL-6 and tyrosine kinase inhibitors' resistance. IL6 is shown to be closely related to hypoxia inducible factor 1-α (HIF-1α) as well as increased VEGF activity. IL-6 binds IL-6 Receptor and results in activation of the JAK/STAT3 signaling pathway, leading to the transcription of STAT3 target genes, i.e., VEGF or SOCS3. SOCS3 suppresses STAT1 activation. STAT3 and NF-κB interact at multiple levels and promote inflammation, increasing tumor cell proliferation and survival as well as tumor angiogenesis and metastasis [30][31][32][33]. Motzer RJ presented at the 2020 ASCO annual virtual meeting the biomarker analyses from the phase 3 ChekMate 214 trial of nivolumab plus ipilimumab vs. sunitinib in clear-cell advanced renal cell carcinoma [34,35]. The angiogenesis gene signature was associated with ORR for sunitinib (high score) and Nivolumab + Ipilimumab (low score). Prolonged PFS with Nivolumab + Ipilimumab was associated with higher expression of Hallmark inflammatory response and Hallmark epithelial-mesenchymal transition gene sets. These results suggest the knowledge that our signature could also play an important role in the therapeutic choice (immunotherapy vs. TKI vs. TKI +immunotherapy).
As shown also with a ClearCode 34-based model, these genomic models were better predictors for recurrence than the SSIG Model [16,36].
Adjuvant treatment with the VEGF receptor tyrosine kinase inhibitors showed no survival benefit in resected, high-risk ccRCC. Therefore, our genomic signature can also explain the hypothesis that the biology of cancer recurrence might be independent of angiogenesis in this setting [37][38][39]: Micrometastases in the adjuvant setting do not require the support of tumor angiogenesis. Therefore, a VEGFR TKI would not eradicate micrometastases. Currently, the role of immune checkpoint inhibitors in an adjuvant setting shows promising results, but are still under investigation in ongoing trials.
Our 17-gene expression signature was significantly associated with RFS of < 1 year and showed greater superiority in predicting localized renal cell carcinoma recurrence when compared with a clinicopathological characteristics' scoring system (SSIGN score).
We acknowledge that our study has limitations: This was a retrospective trial. The population was small and characterized by two predefined distinct groups of patients selected a priori on recurrence-free survival of <1 years vs. >5 years.
However, our genomic signature enhances risk stratification. Future prospective studies, with large cohorts of patients, will be necessary to validate this signature.

Conclusions
This genomic panel could drive treatment personalization and support different surveillance programs to improve management of two distinct ccRCC prognostic groups.

Data Availability Statement:
The data are reported on medical records and available on request.

Conflicts of Interest:
The authors declare no conflict of interest.